Conversation
Entity operations triggered from orchestrations don't propagate distributed trace context because SendEntityMessageAction lacks a parentTraceContext field. This causes entity operations to appear disconnected from their parent orchestration traces. Changes: - Vendor updated proto with parentTraceContext field on SendEntityMessageAction (see microsoft/durabletask-protobuf#64) - Set ParentTraceContext in ProtoUtils.ConstructOrchestratorResponse for entity message actions (matching existing behavior for tasks and sub-orchestrations) - Add unit tests for entity trace context propagation Note: A corresponding server-side change is also needed in the Durable Task Scheduler backend to propagate SendEntityMessageAction.ParentTraceContext to OperationRequest.traceContext on entity work items. Fixes #653
There was a problem hiding this comment.
Pull request overview
This PR fixes a distributed tracing gap where entity operations dispatched from orchestrations lacked trace context propagation. Without ParentTraceContext on SendEntityMessageAction, entity spans appeared as disconnected root traces in distributed trace viewers (e.g., Aspire Dashboard, Jaeger) rather than as children of the orchestration that triggered them.
Changes:
- Added
parentTraceContextfield (field 5) to theSendEntityMessageActionproto message, consistent withScheduleTaskActionandCreateSubOrchestrationAction - Set
ParentTraceContext = CreateTraceContext()inProtoUtils.ConstructOrchestratorResponsefor entity message actions, matching the existing behavior for activities and sub-orchestrations - Added unit tests verifying trace context propagation for signal/call entity, null-activity fallback, non-entity-converted actions, and unique span ID generation
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/Grpc/orchestrator_service.proto |
Adds parentTraceContext field (index 5) to SendEntityMessageAction |
src/Shared/Grpc/ProtoUtils.cs |
Sets ParentTraceContext on entity message actions; incidental character encoding change in unrelated comments |
test/Worker/Grpc.Tests/ProtoUtilsTraceContextTests.cs |
New test class covering trace context propagation for entity operations |
…quest Add parentTraceContext fields to EntityOperationSignaledEvent and EntityOperationCalledEvent proto messages, and extract them in ToEntityBatchRequest to populate OperationRequest.TraceContext. This is the receiving side of the trace context propagation: once the DTS backend populates parentTraceContext on entity operation events (from SendEntityMessageAction.ParentTraceContext), the SDK will correctly extract and use the trace context for entity operations.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
You can also share your feedback on Copilot code review. Take the survey.
| EntityUnlockSentEvent entityUnlockSent = 4; | ||
| } | ||
|
|
||
| TraceContext parentTraceContext = 5; |
There was a problem hiding this comment.
Note to self: don't merge changes to this file until properly vendored (requires microsoft/durabletask-protobuf#64 to be merged).
Restore original LF line endings that were inadvertently converted to CRLF when editing on Windows. Only the actual code changes remain in the diff.
sophiatev
left a comment
There was a problem hiding this comment.
Looks good, can you just add a screenshot of an example trace for posterity?
Create trace spans for entity operation execution in OnRunEntityBatchAsync, using the trace context propagated from the parent orchestration. This completes the distributed tracing story for entities by making entity operations visible in trace viewers. - Add EntityOperation constant to TraceActivityConstants - Add StartTraceActivityForEntityOperation to TraceHelper - Create span with entity metadata tags in GrpcDurableTaskWorker.Processor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/Worker/Grpc/GrpcDurableTaskWorker.Processor.cs:933
- The entity trace activity is disposed via the
usingstatement afterCompleteEntityTaskAsync()completes, which means the span duration will include the network round-trip time of sending the response. In contrast,OnRunActivityAsyncexplicitly callstraceActivity?.Stop()beforeCompleteActivityTaskAsync()to avoid including the network transmission latency in the measured span (see the comment at line 853: "Stop the trace activity here to avoid including the completion time in the latency calculation"). The entity handler should follow the same pattern and explicitly stop the activity before callingCompleteEntityTaskAsync.
P.EntityBatchResult response = batchResult.ToEntityBatchResult(
completionToken,
operationInfos?.Take(batchResult.Results?.Count ?? 0));
await this.client.CompleteEntityTaskAsync(response, cancellationToken: cancellation);
You can also share your feedback on Copilot code review. Take the survey.
Add Client spans (emitted during orchestration replay) and Server spans (emitted during entity execution) for entity operations. This creates proper Client->Server span nesting in trace viewers, matching the existing pattern used by activities and sub-orchestrations. - TraceHelper: Add EmitTraceActivityForEntityOperationCompleted/Failed and StartTraceActivityForSchedulingEntityOperation (Client spans) - Processor: Handle EntityOperationCompleted/Failed in newEvents switch with GetEntityOperationCalledEvent helper for request correlation - Include before/after screenshots showing entity traces in Aspire Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Screenshots are provided separately in the PR description. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
You can also share your feedback on Copilot code review. Take the survey.
- Add Schema.Task.EntityOperationName constant for entity operation tag - Replace magic string 'durabletask.entity.operation' with constant - Combine nested if statements for ParentTraceContext null check - Extract entity name from TargetInstanceId (format '@name@key') - Add comment explaining Server span trace context for entity batches Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each entity operation in a batch may come from a different orchestration with its own trace context. Create individual Server spans per operation instead of a single span for the whole batch, so each operation's span is correctly parented under its corresponding Client span. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
You can also share your feedback on Copilot code review. Take the survey.
- Replace foreach loop with LINQ Select/OfType/ToList for trace activities - Mark trace activities with Error status in entity-not-found path - Use non-nullable List<Activity> initialized via LINQ expression Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@cgillum What's the story with entity lock events wrt. distributed tracing? Do we currently emit spans/traces for locking? Is it worth adding here if not? |
@andystaples I considered Lock requests as out-of-scope for this PR since it's not directly applicable to entity execution traces. However, I think it's a good future work item for us to consider since it would be helpful for users to be able to see when locks are added/removed. |
| string targetInstanceId = historyEvent?.EntityOperationCalled?.TargetInstanceId ?? string.Empty; | ||
|
|
||
| // Extract entity name from instance ID (format: "@name@key") | ||
| string entityName = targetInstanceId; |
There was a problem hiding this comment.
I believe we have APIs to do this in EntityInstanceId if I'm not mistaken?
| calledEvent.ParentTraceContext?.TraceState, | ||
| out ActivityContext parentContext)) | ||
| { | ||
| newActivity.SetSpanId(parentContext.SpanId.ToString()); |
There was a problem hiding this comment.
Do you happen to know why we use Activity.Current.Context as the parent trace context but manually set the span ID here? Why aren't we setting the parent trace context to just be the parentContext extracted here?
I see this is done for the other methods here as well but I forget (or perhaps never understood) why
| FailureDetails = new FailureDetails(frameworkException), | ||
| }; | ||
|
|
||
| traceActivities.ForEach(a => a.SetStatus(ActivityStatusCode.Error, frameworkException.Message)); |
There was a problem hiding this comment.
Again just a question for my own understanding - in this case and the case above, we will just end up with Activities with just server spans and no corresponding client span?
| newEvent.TimerFired); | ||
| break; | ||
|
|
||
| case P.HistoryEvent.EventTypeOneofCase.EntityOperationCompleted: |
There was a problem hiding this comment.
Where is the logic for emitting client spans for an entity signal request? Maybe I'm missing something but looks like this is just for entity calls
There was a problem hiding this comment.
I just remembered that @philliphoff added an extensive TracingIntegrationTests class under Grpc.Integration.Tests. It would be good to give copilot the task of adding tests for the entities cases as well
Summary
Entity operations triggered from orchestrations don't propagate distributed trace context because
SendEntityMessageActionlacks aparentTraceContextfield, and entity operation events (EntityOperationSignaledEvent,EntityOperationCalledEvent) also lack this field. This causes entity operations to appear as disconnected root spans rather than being properly nested under their parent orchestration in distributed tracing tools.Changes
Proto updates (see microsoft/durabletask-protobuf#64)
parentTraceContextfield toSendEntityMessageAction(field 5)parentTraceContextfield toEntityOperationSignaledEvent(field 6) andEntityOperationCalledEvent(field 8)Trace context propagation (ProtoUtils.cs)
ParentTraceContextonSendEntityMessageActioninConstructOrchestratorResponse, matching the existing pattern for tasks and sub-orchestrations. UsesCreateTraceContext()which generates a randomclientSpanIdfor Client→Server span pairing.parentTraceContextfromEntityOperationSignaledEvent/EntityOperationCalledEventinToEntityBatchRequest, populatingOperationRequest.TraceContext.Entity Server spans (TraceHelper.cs, Processor.cs)
StartTraceActivityForEntityOperation()— creates a Server span during entity execution, parented under the Client span via the propagated trace context.OnRunEntityBatchAsyncusing the first operation's trace context.Entity Client spans (TraceHelper.cs, Processor.cs)
EmitTraceActivityForEntityOperationCompleted()andEmitTraceActivityForEntityOperationFailed()— emit Client spans during orchestration replay when entity operation results arrive.StartTraceActivityForSchedulingEntityOperation()— creates the Client span and usesSetSpanId()to match theclientSpanIdfromParentTraceContext, creating the Client→Server nesting.EntityOperationCompletedandEntityOperationFailedevents in the orchestratornewEventsprocessing loop.GetEntityOperationCalledEvent()helper for correlating completion events with their original called events viarequestId.Schema and constants
EntityOperation = "entity_operation"toTraceActivityConstants.csEntityOperationName = "durabletask.entity.operation"toSchema.csTests
How it works
Entity operations now follow the same Client→Server span pair pattern as activities and sub-orchestrations:
CreateTraceContext()generates a randomclientSpanId→ stored inSendEntityMessageAction.ParentTraceContextParentTraceContextto both the entity message event and the orchestrator history eventOnRunEntityBatchAsynccreates a Server span using the propagated trace context (clientSpanId becomes the parent)EntityOperationCalledEventfrom history →StartTraceActivityForSchedulingEntityOperationcreates a Client span withSetSpanId()matching theclientSpanIdScreenshots
For this fix, we used a Durable Agent scenario to validate. The source code is here. TL/DR, it's an orchestration that calls two entities and then an activity function. The activity function call was needed so that we could compare the spans but also to ensure we didn't regress activity behavior (an earlier version of this change did regress activities a bit):
Before
Here's what the scenario looked like prior to the fix.
Notice the entity traces don't appear at all. They can be seen as top-level unparented traces (not shown).
After
Here's what the scenario looked like after the fix.
🎉
Merge note
Related PRs
ParentTraceContexton entity history eventsFixes #653